"DMWS-MTA SZTAKI"
Data Mining and Web search Research Group
Computer and Automation Research Institute, Hungarian
Academy of Sciences

VAST 2010 Challenge
Hospitalization Records -  Characterization of Pandemic Spread

Authors and Affiliations:

Eszter Friedman, MTA SZTAKI, feszter@info.ilab.sztaki.hu
Julianna Göbölös-Szabó, MTA SZTAKI, gobolos.szabo.julianna@gmail.com
Adrienn Szabó, MTA SZTAKI, adrienn.szabo4@gmail.com [PRIMARY contact]
András Lukács, MTA SZTAKI, alukacs@sztaki.hu

Tool(s):

The Epidemic Outbreak Visualizer (EOV) plots how the number of patients changes over time, optionally filtered by features such as symptoms, age or gender. The tool flags unexpected or unusual events and trends, which can help in recognizing a new epidemic outbreak, and it supports comparing outbreaks across different cities. The tool was implemented for this contest by Adrienn Szabó in a short week.

Video:

Video


MC2.1: Analyze the records you have been given to characterize the spread of the disease.  You should take into consideration symptoms of the disease, mortality rates, temporal patterns of the onset, peak and recovery of the disease.  Health officials hope that whatever tools are developed to analyze this data might be available for the next epidemic outbreak.  They are looking for visualization tools that will save them analysis time so they can react quickly.

First we transformed the hospital records into simple time series data. We considered only frequent symptoms (such as abdominal pain, back pain, diarrhea, headache, head bleeding, cough, fever and rash), gender, and three age groups (under 20 years, 20-59 years, and over 60). Next, we aggregated these features, and all pairs of them, into daily data: for each day we counted the number of patients with the given feature or feature pair, so that we could observe how these counts change over time. (Generating all these input files took about 1 hour on a strong commodity PC.)
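The aggregation step described above can be sketched roughly as follows. This is only an illustration, not the code used in the tool: the record layout (a dict with `date`, `age`, `gender` and `symptoms` keys), the function names and the feature labels are all hypothetical, and placing age 60 in the oldest group is an assumption, since the text does not say where that boundary falls.

```python
from collections import Counter
from itertools import combinations


def age_group(age):
    """Map an age in years to one of three coarse groups.

    Assumption: age 60 itself is placed in the oldest group.
    """
    if age < 20:
        return "age<20"
    if age < 60:
        return "age20-59"
    return "age60+"


def daily_counts(records, frequent_symptoms):
    """Count, per day, the patients having each feature and each feature pair.

    `records` is an iterable of dicts with the hypothetical keys
    'date', 'age', 'gender' and 'symptoms'. The result is a Counter
    keyed by (date, feature_tuple), where feature_tuple has 1 or 2 items.
    """
    counts = Counter()
    for rec in records:
        # Keep only the frequent symptoms, then add gender and age group.
        features = {s for s in rec["symptoms"] if s in frequent_symptoms}
        features.add(rec["gender"])
        features.add(age_group(rec["age"]))
        ordered = sorted(features)  # stable key order for pairs
        for feature in ordered:
            counts[(rec["date"], (feature,))] += 1
        for pair in combinations(ordered, 2):
            counts[(rec["date"], pair)] += 1
    return counts
```

Reading the counts for one key across all dates then yields the daily time series for that feature or feature pair.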

To compare the changes of different symptoms, we monitored how the counts differ from the number of patients in a normal, “epidemic-free” period. We therefore computed the mean and variance of the number of patients over the first 5 and the last 20 days of the time series, and assigned to each day an index showing how far that day's patient count lies from the expected value. A large positive index on several consecutive days suggests that an epidemic is probably breaking out, so watching this value can help health officials react as soon as possible. Note that the index can jump by chance without an epidemic, because the number of patients behaves as a random variable, but it is very unlikely to remain high for more than 2-3 consecutive days without one. The tool can also flag unusual growth in the number of deceased patients, again filtered by symptoms or by the other features mentioned above.
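The text does not give the exact formula of the index, so the sketch below assumes a plain standardized deviation (the count minus the baseline mean, divided by the baseline standard deviation). The function names, the alert threshold of 3, the run length of 3 days and the `min_sigma` floor are hypothetical choices made only for this illustration.

```python
from statistics import mean, pstdev


def deviation_index(series, head=5, tail=20, min_sigma=1.0):
    """Standardized deviation of each day's count from the baseline.

    The baseline is the first `head` and last `tail` days of `series`
    (a list of daily counts), as described in the text. `min_sigma` is
    a hypothetical floor that avoids dividing by a near-zero spread.
    """
    baseline = series[:head] + series[-tail:]
    mu = mean(baseline)
    sigma = max(pstdev(baseline), min_sigma)
    return [(count - mu) / sigma for count in series]


def first_sustained_alert(index, threshold=3.0, min_days=3):
    """Return the position of the first day that starts a run of at
    least `min_days` consecutive index values above `threshold`,
    or None if no such run exists."""
    run = 0
    for i, value in enumerate(index):
        run = run + 1 if value > threshold else 0
        if run >= min_days:
            return i - min_days + 1
    return None
```

Requiring a run of several days, rather than a single high value, reflects the observation that an isolated jump can happen by chance while a sustained one is very unlikely without an epidemic.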

These time series are plotted in a table, and the index of “probability” is color-coded as in a heat map. If more people than expected are admitted to hospital, the cell of the symptom gets a more reddish color; if the count is around the expected daily value, it is shown in green; blue means a patient count below expectations. The visualizer also highlights, with a thicker border around the cell, cases where the number of deaths related to the considered symptoms is above a threshold. In some cases the sample used to compute the mean and variance of the normal period was so small that even a slight growth in the number of patients could trigger a warning (a red cell). To avoid such situations we down-scaled these noisy data. To check a suspicious symptom or symptom pair, the user can left-click any cell of the table to see a chart of the number of admitted patients with the selected symptom (or pair) over the whole time period. Right-clicking brings up a similar chart for deceased patients. This helps in judging the severity of the situation and gives a fuller picture of it.
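The cell styling can be illustrated with a small sketch. It simplifies the heat map to three discrete colors instead of a gradient, and the function name, the width of the "expected" band and the death threshold are hypothetical values, not the ones used in the tool.

```python
def cell_style(index, deaths, band=1.0, death_threshold=10):
    """Choose a color and border for one heat-map cell.

    `index` is the day's deviation from the expected patient count and
    `deaths` the number of related fatal cases on that day. Both `band`
    and `death_threshold` are assumed values for illustration only.
    """
    if index > band:
        color = "red"      # more patients than expected
    elif index < -band:
        color = "blue"     # fewer patients than expected
    else:
        color = "green"    # around the expected daily value
    border = "thick" if deaths > death_threshold else "thin"
    return color, border
```

For example, `cell_style(4.2, 25)` marks a cell as red with a thick border, while `cell_style(0.3, 0)` leaves it green with a thin one.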

On the left is the heat map of Aleppo showing the state of the epidemic on 10 May. On the right we can see the trend over the whole period for patients with both back pain and abdominal pain. The upper graph shows the number of people admitted to hospital with these symptoms; the lower graph shows the number of fatal cases.

With this tool we are able to recognize the epidemic in Aleppo, Colombia, Iran, Karachi, Yemen, Lebanon, Nairobi, Saudi Arabia and Venezuela. Turkey and Thailand escaped the pandemic.


MC2.2:  Compare the outbreak across cities.  Factors to consider include timing of outbreaks, numbers of people infected and recovery ability of the individual cities.  Identify any anomalies you found.

Having analyzed the hospital records, we found that there was no epidemic outbreak in Turkey and Thailand. In the other cities the typical symptoms are the following:

·         Aleppo: abdominal pain, back pain, diarrhea, fever, head pain and vomiting

·         Colombia: abdominal pain, back pain, vomiting, diarrhea and fever

·         Iran: abdominal pain, back pain, diarrhea and vomiting

·         Karachi: abdominal pain, back pain, diarrhea, fever, head pain and vomiting

·         Lebanon: abdominal pain, diarrhea and vomiting

·         Nairobi: abdominal pain, back pain, diarrhea, fever, head pain and vomiting

·         Saudi Arabia: abdominal pain, back pain, diarrhea and vomiting

·         Venezuela: abdominal pain, back pain, vomiting and diarrhea

·         Yemen: abdominal pain, back pain, vomiting, diarrhea and fever

The countries can be sorted into groups according to their symptoms:

1.      In Lebanon, Venezuela and Iran the main symptoms are abdominal pain, back pain, vomiting and diarrhea. Vomiting is the most frequent symptom and it is often combined with the others.

2.      In Colombia and Yemen fever appears as an additional symptom and, like vomiting, it occurs paired with the rest of the frequent symptoms.

3.      The epidemic is the most severe in Aleppo, Karachi and Nairobi. A possible atypical (less frequent) individual symptom is nose bleeding, which is as frequent as in the previous group but, in contrast to those countries, is accompanied here by a higher mortality rate.

The order of the outbreaks is the following:

 

City         | Time of Outbreak | Peak                    | Recovery | Duration
Nairobi      | 04.28.09         | Between 05.10 and 05.20 | 05.28.09 | 30 days
Karachi      | 04.30.09         | Between 05.11 and 05.20 | 06.06.09 | 40 days
Iran         | 05.01.09         | Between 05.16 and 05.23 | 06.06.09 | 37 days
Aleppo       | 05.02.09         | Between 05.10 and 05.20 | 05.30.09 | 29 days
Venezuela    | 05.02.09         | Between 05.15 and 05.23 | 05.30.09 | 28 days
Lebanon      | 05.03.09         | Between 05.14 and 05.21 | 05.30.09 | 27 days
Saudi Arabia | 05.04.09         | Between 05.15 and 05.22 | 06.01.09 | 27 days
Yemen        | 05.04.09         | Between 05.12 and 05.21 | 06.03.09 | 29 days
Colombia     | 05.05.09         | Between 05.16 and 05.25 | 06.01.09 | 25 days

The number of deceased patients seems to correlate with the intensity of the epidemic. In Aleppo, Karachi, Nairobi and Yemen there was a huge jump in the number of fatal cases during the peak of the epidemic. In Saudi Arabia, Venezuela and Colombia there was a notable growth too, while in other countries such as Lebanon and Iran the number of deceased patients changed only slightly, probably because the local epidemic was weak.

This screenshot shows the number of deceased patients with the symptom vomiting in Aleppo and Lebanon; vomiting was found to be one of the key symptoms of the epidemic. These two countries have quite different epidemic characteristics. In both, the normal daily number of deceased patients is about 2, but during the peak of the epidemic it reaches 1800 in Aleppo, while it stays around 150 in Lebanon. This difference in intensity is visible in the tables: the cell in the 7th row and 7th column has a thicker border for Aleppo than for Lebanon, and both borders are heavier than in normal cases.